BACKGROUND OF THE INVENTION

1. Field of the Invention 

The present invention relates in general to signal coding and, more particularly,
to variable bit rate speech coding.

2. Background

Speech coding is traditionally driven by bandwidth considerations and
efficiency. As a result, modern communication systems typically implement various
speech coding and compression techniques to reduce bandwidth requirements and
to achieve higher transmission efficiency.

One typical scheme for providing speech coding is a technique called Pulse
Code Modulation ("PCM"), which converts speech signals into digital form
and is widely used by the telephone companies in their T1 circuits. Every minute of
the day, millions of telephone conversations, as well as data transmissions via
modems, are converted into digital form via PCM for transport over high-speed
intercity trunks. PCM samples the analog waves 8,000 times per second and converts
each sample into an 8-bit number, resulting in a 64 kbps data stream. In fact, the PCM
technique has been adopted by the International Telecommunication Union ("ITU")
under the G.711 standard, which defines a single-rate coding method at 64 kbps.
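
The 64 kbps figure follows directly from the sampling parameters stated above. As a minimal illustrative sketch (the function and constant names are hypothetical and are not part of any standard), the arithmetic may be written as:

```python
# Bit rate of PCM telephony from the figures quoted in the text.
SAMPLE_RATE_HZ = 8_000   # samples per second (from the text)
BITS_PER_SAMPLE = 8      # bits per sample (from the text)


def pcm_bit_rate_kbps(sample_rate_hz: int, bits_per_sample: int) -> float:
    """Return the PCM bit rate in kilobits per second."""
    return sample_rate_hz * bits_per_sample / 1000


print(pcm_bit_rate_kbps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE))  # 64.0
```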

Another technique adopted by the ITU utilizes a method called Adaptive
Differential PCM ("ADPCM") that converts analog sound, such as speech, into
digital form. Using this technique, in lieu of coding an absolute measurement at each
sample point, the difference between samples is coded. ADPCM can dynamically
switch the coding scale to compensate for variations in amplitude. The ITU standards
that have utilized this technique include G.721 (32 kbps), G.722 (64 kbps), G.723 (20
kbps and 40 kbps), G.726 (16 kbps, 24 kbps, 32 kbps and 40 kbps) and G.727 (16
kbps, 24 kbps, 32 kbps and 40 kbps).
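
The idea of coding differences with an adaptive scale can be illustrated with a toy coder. The sketch below is not the G.721, G.726 or G.727 algorithm; the step-adaptation rule, the code range of -7 to 7, and all function names are assumptions chosen only to show the principle that encoder and decoder track the same prediction and the same step size.

```python
def _adapt(step, code, min_step=1, max_step=1024):
    """Hypothetical rule: widen the scale after large codes, narrow it after small ones."""
    if abs(code) >= 5:
        step *= 2
    elif abs(code) <= 1:
        step //= 2
    return max(min_step, min(max_step, step))


def toy_adpcm_encode(samples, step=4):
    """Code each sample as a small integer describing its difference from the prediction."""
    codes, predicted = [], 0
    for sample in samples:
        diff = sample - predicted
        code = max(-7, min(7, round(diff / step)))  # clamp to a 4-bit style range
        codes.append(code)
        predicted += code * step   # mirror what the decoder will reconstruct
        step = _adapt(step, code)
    return codes


def toy_adpcm_decode(codes, step=4):
    """Rebuild the signal by accumulating the scaled differences."""
    out, predicted = [], 0
    for code in codes:
        predicted += code * step
        out.append(predicted)
        step = _adapt(step, code)
    return out


codes = toy_adpcm_encode([0, 8, 8, 40, 200])
print(codes, toy_adpcm_decode(codes))
```

Because only a small code is sent per sample instead of the full 8-bit value, the bit rate drops, at the cost of some reconstruction error when the signal changes faster than the step size can follow.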

A more recent ITU standard has adopted the Code Excited Linear Prediction
("CELP") technique in the G.729 family: the main body and Annex A (8 kbps), Annex B
(0 kbps and 1.5 kbps), Annex D (6.4 kbps), Annex E (11.2 kbps), and Annex I (0 kbps,
1.5 kbps, 6.4 kbps, 8 kbps and 11.2 kbps). This family achieves high compression
ratios along with toll quality narrow-band (telephone band) audio. A similar method
has also been utilized in G.723.1 (5.3 kbps and 6.4 kbps). In addition, a method called
Low-Delay CELP ("LD-CELP") has been used in the G.728 (16 kbps) standard and
provides near toll quality audio by using a smaller sample size that is processed
faster, resulting in lower delays.

As noted above, the G.723, G.726, G.727, G.729 Annex I and G.723.1 standards
define a multi-rate capability for speech data transfer. Today, network providers,
such as AT&T, MCI or Sprint, take advantage of these multiple rates by controlling
data bit rates according to predetermined factors, such as time of the
day or particular usage of the network. For example, the network providers may
decide to save network bandwidth during business hours and limit the data bit rate to
6.4 kbps. After business hours, however, the network providers may increase the data
bit rate to 11.2 kbps. In addition, the network providers may allocate certain lines for
high quality speech data transfer during specific hours.

Figure 1 illustrates a typical system 100 used by the network providers for
implementing the above schemes. As shown, system 100 includes a plurality of
speech encoders 1, 2, ... n, enumerated as modules 130, 140, 150, respectively. In
one embodiment, system 100 may be ITU G.729 Annex I compatible and speech
encoder 130 may encode at 6.4 kbps, speech encoder 140 may encode at 8.0 kbps and
speech encoder 150 may encode at 11.2 kbps.

As shown in Figure 1, encoder selector 112 is positioned by the network
controller 120. As stated above, the selector 112 is positioned in accordance with
predetermined factors under the network provider's control. For example, the network
controller 120 may decide to use the speech encoder 150 at a data bit rate of 11.2 kbps
after business hours or from 2:00 p.m. to 4:00 p.m. when communication channel 160
is utilized for music broadcast, which requires high data rates to preserve the speech
quality. On the other hand, the network controller 120 may position the encoder
selector 112 so as to select the speech encoder 130 at a data bit rate of 6.4 kbps for
voice communications from 4:00 p.m. to 8:00 p.m.
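
The conventional, schedule-driven selection of Figure 1 can be sketched as a simple lookup by time of day. In the sketch below, the encoder numbers, rates and the two time windows come from the example above; the table layout, the function name and the default encoder used outside the listed windows are assumptions made purely for illustration.

```python
from datetime import time

# Encoder reference numerals and rates from the Figure 1 example.
ENCODER_RATE_KBPS = {130: 6.4, 140: 8.0, 150: 11.2}

# (start, end, encoder) windows taken from the example in the text.
SCHEDULE = [
    (time(14, 0), time(16, 0), 150),  # music broadcast, high rate
    (time(16, 0), time(20, 0), 130),  # voice communications, low rate
]
DEFAULT_ENCODER = 140  # assumption: the text does not specify other hours


def select_encoder_by_time(now: time) -> int:
    """Position the selector purely from the clock, ignoring the signal content."""
    for start, end, encoder in SCHEDULE:
        if start <= now < end:
            return encoder
    return DEFAULT_ENCODER


print(select_encoder_by_time(time(15, 0)))   # 150
print(select_encoder_by_time(time(17, 30)))  # 130
```

Note that the decision never inspects the speech signal itself, which is the limitation discussed next.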

While such traditional multi-rate speech encoders have been successfully
implemented in digital communication systems, they are restricted in use and
application. Such systems are disadvantageous and inflexible, since data bit rates are
set based on predetermined factors that may or may not hold true. As a result, too
little or too much of the network bandwidth may be used for a given speech signal. For
example, high quality speech, such as music, may be transmitted on a communication
channel selected to transmit at low data rates, thus causing degradation in
quality. On the other hand, a high data rate communication channel may be wasted if
only low quality speech, such as voice which does not require a high bandwidth, is
transmitted.

Accordingly, there is an intense need in the technology for a flexible speech
encoder that can efficiently utilize the bandwidth of a given communication channel.
Furthermore, there is a strong need in the industry for a speech encoder system that
can combine various speech encoding schemes while maintaining interoperability with
the existing speech decoders and standards.

SUMMARY OF THE INVENTION

In accordance with the purpose of the present invention as broadly described
herein, there is provided a method and system for rate determination coding.

In one embodiment, the present invention includes a data rate determinator and
a plurality of data signal encoders. The data rate determinator determines the data rate
for the data signal and selects one of the data signal encoders based on the determined
data rate, so that the data signal is encoded accordingly.

In another embodiment, the system includes a plurality of speech encoders, a
network controller capable of selecting at least two of the speech encoders and a data
rate determinator capable of determining the data rate of the speech signal and
selecting, according to the data rate, one of the speech encoders selected by the
network controller.

In one aspect of the present invention, the data or speech signal includes a
number of frames and the data rate determinator determines the data rate of each of
the frames and selects one of the encoders based on the data rate of each frame. The
signal is then encoded frame-by-frame. In another aspect of the present invention,
different encoding standards may be utilized for encoding various frames of the
signal.

Other aspects of the present invention will become apparent with further
reference to the drawings and specification, which follow.

BRIEF DESCRIPTION OF THE DRAWINGS


The features and advantages of the present invention will become more readily
apparent to those ordinarily skilled in the art after reviewing the following detailed
description and accompanying drawings, wherein:

Figure 1 illustrates a conventional speech encoding system.

Figure 2 illustrates one embodiment of a speech encoding system of the present
invention.

Figure 3 illustrates an example input signal of Figure 2.

Figure 4 illustrates another embodiment of a speech encoding system of the
present invention.

DETAILED DESCRIPTION OF THE INVENTION

An embodiment of the present invention is shown in Figure 2. As shown,
speech encoding system 200 includes speech encoders 1...n. In one embodiment, the
speech encoders 1...n may support a subset or a complete set of speech coding data
rates of a single standard. In this particular example, however, the speech encoders
(1..3) 230, 240 and 250 may support data bit rates of 6.4, 8.0 and 11.2
kbps of the G.729 Annex I standard, respectively. In another embodiment, speech
encoding system 200 may include five speech encoders for supporting all data bit rates
defined under the G.729 Annex I standard. In yet another embodiment, each speech
encoder may support a different standard. For example, the speech encoder 230 may
support the G.721 ADPCM standard at 32 kbps, the speech encoder 240 may support
the G.723.1 standard at 5.3 kbps and the speech encoder 250 may support the G.729
Annex I standard at 11.2 kbps.

As shown, speech signal 210 enters the encoding system 200 for transmission
over communication channel 260. A "communication channel" refers to the medium
or channel of communication. The communication channel may include, but is not
limited to, a telephone line, a modem connection, an Internet connection, an Integrated
Services Digital Network ("ISDN") connection, an Asynchronous Transfer Mode
("ATM") connection, a frame relay connection, an Ethernet connection, a coaxial
connection, a fiber optic connection, satellite connections (e.g., Digital Satellite
Services, etc.), wireless connections, radio frequency ("RF") links, electromagnetic
links, two-way paging connections, etc., and combinations thereof.

In accordance with the practices of persons skilled in the art of computer
programming, the present invention is described below with reference to symbolic
representations of operations that are performed by the system 200 (Figure 2) and/or
the system 400 (Figure 4), unless indicated otherwise. Such operations are sometimes
referred to as being computer-executed. It will be appreciated that operations that are
symbolically represented include the manipulation by a processor of electrical signals
representing data bits and the maintenance of data bits at memory locations in system
memory (not shown), as well as other processing of signals. The memory locations
where data bits are maintained are physical locations that have particular electrical,
magnetic, optical, or organic properties corresponding to the data bits.

When implemented in software, the elements of the present invention are
essentially the code segments to perform the necessary tasks. The program or code
segments can be stored in a processor readable medium or transmitted by a data signal
embodied in a carrier wave over a transmission medium or communication link. The
"processor readable medium" may include any medium that can store or transfer
information. Examples of the processor readable medium include an electronic
circuit, a semiconductor memory device, a ROM, a flash memory, an erasable ROM
(EROM), a floppy diskette, a CD-ROM, an optical disk, a hard disk, a fiber optic
medium, a radio frequency (RF) link, etc. The computer data signal may include any
signal that can propagate over a transmission medium such as electronic network
channels, optical fibers, air, electromagnetic, RF links, etc. The code segments may
be downloaded via computer networks such as the Internet, Intranet, etc.

Returning to Figure 2, the speech signal 210 is routed to a rate determination
controller module 220 for analyzing the speech signal on a frame-by-frame basis. Each
frame of speech is analyzed by the rate determination controller 220 in order to select
one of the speech encoders 230-250 for the most efficient use of the communication
channel 260. As understood by those of ordinary skill in the art, for example, frames
of speech are sampled at 10 ms intervals or blocks under the G.729 standard. After
analysis of each 10 ms frame of speech, using well-known methods, the rate
determination controller 220 may select one of the plurality of speech encoders 230,
240 and 250.
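
Because the selection is made once per frame, each choice fixes the number of bits spent on that frame. As a small worked example (a sketch only; the helper name is hypothetical, and the rates are those quoted in this description for encoders 230, 240 and 250), the per-frame bit budget for a 10 ms frame is the bit rate multiplied by the frame duration:

```python
FRAME_MS = 10  # frame duration stated in the text for G.729


def bits_per_frame(rate_kbps: float, frame_ms: int = FRAME_MS) -> int:
    """kbps multiplied by milliseconds gives bits (1000 bit/s * 0.001 s = 1 bit)."""
    return round(rate_kbps * frame_ms)


# Rates quoted in the text for encoders 230, 240 and 250.
for rate in (6.4, 8.0, 11.2):
    print(rate, "kbps ->", bits_per_frame(rate), "bits per 10 ms frame")
```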

For example, if the speech signal has the shape or characteristics of a male
voice, the rate determination controller 220 may position the encoder selector 212 to
select a medium data rate speech encoder, such as the speech encoder 230, G.729 6.4
kbps, to encode that particular frame. For the next frame, however, if the rate
determination controller 220 finds a higher quality speech frame, such as music-like
speech, the rate determination controller 220 may position the encoder selector 212 to
select a high data rate encoder, such as the speech encoder 250, G.729 11.2 kbps, to
encode that speech frame in order to prevent quality degradation. In one embodiment,
the speech encoder 250 of the system 200 may be a G.727 ADPCM encoder at 24.0
kbps; in that event, positioning the encoder selector 212 to the speech encoder 250 by
the rate determination controller 220 would cause the speech frame to be encoded
using the G.727 standard.
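
The per-frame selection just described can be sketched as a mapping from a frame's character to an encoder. In this sketch the encoder numerals and rates follow the Figure 2 example, while the `Frame` type, the `kind` labels, and the fallback to encoder 240 are hypothetical; how a frame is recognized as voice-like or music-like is left to the well-known methods the description refers to and is not modeled here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Frame:
    samples: List[int]
    kind: str  # hypothetical label produced by some external analysis step


# Rates from the Figure 2 example (G.729 Annex I).
RATE_KBPS = {230: 6.4, 240: 8.0, 250: 11.2}

# Illustrative mapping following the male-voice and music examples in the text.
ENCODER_FOR_KIND = {"voice": 230, "music": 250}


def select_encoder(frame: Frame, default: int = 240) -> int:
    """Return the encoder the selector 212 would be positioned to for this frame."""
    return ENCODER_FOR_KIND.get(frame.kind, default)  # default is an assumption


def plan_stream(frames: List[Frame]) -> List[Tuple[int, float]]:
    """Choose an encoder independently for every frame of the signal."""
    return [(select_encoder(f), RATE_KBPS[select_encoder(f)]) for f in frames]


print(plan_stream([Frame([], "voice"), Frame([], "music")]))
```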

It should be noted that according to one embodiment of the present invention,
various numbers of speech encoders of different standards may be included in the
speech encoding system 200. Such an embodiment, of course, requires a complementary
speech decoding system that can support these various speech encoders in order to
decode the speech on a frame-by-frame basis.

However, in some embodiments, the speech encoding system 200 may encode
the speech frames using various speech encoders belonging to a single standard, such
as G.729 Annex I. Such systems are advantageous since they require no change to the
conventional decoding systems.

The rate determination controller 220 may be implemented as hardware,
firmware or software, or any combination thereof. The resulting bit stream from each
of the speech encoders 230, 240 and 250 is provided to the communication channel 260.


As described above, speech signal 210 is first routed to the rate determination
controller 220 on a frame-by-frame basis. Once the speech signal 210 is routed to
the rate determination controller 220, a predetermined flag in the header of the
speech frame is analyzed to determine the classification of the speech frame. For
example, the value of the flag in the speech frame may indicate that the speech frame
is a non-active speech signal (background noise or silence) and thus is to be processed
by a low bit rate encoder. The value of the flag in the speech frame may indicate that
the speech frame is active speech of high quality, such as music, and is thus to
be processed using a high bit rate encoder. In the alternative, the value of the flag in
the speech frame may indicate that the speech frame is active speech but of
medium quality, such as male voice, and is thus to be processed using a medium bit
rate encoder. Once the encoding scheme is determined, the speech frame is routed to
one of the speech encoders 1..n via the encoder selector 212. It is understood that
classification of the input speech may be accomplished by any type of control circuit
or software, based on a predetermined standard, criterion or set of criteria, or based on
system requirements and/or need.

Docket No.: 00CON107P

Turning to Figure 3, a speech signal diagram 300 is shown. Figure 3 illustrates
a speech signal 330 mapped onto amplitude 310 and time 320 axes. The speech signal
330 is broken down into blocks of time as denoted by vertical dotted lines. Each
block of time a-v, on the time line 340, represents one frame of speech. As stated
above, one frame of speech is, for example, 10 ms in duration per the G.729 ITU
standard, or in some embodiments, frames are in 5 ms intervals. Referring back to
Figure 2 and assuming the speech encoders 230, 240 and 250 are G.729 1.5 kbps,
G.729 8.0 kbps and G.726 32.0 kbps, respectively, when the speech frame (a) of
speech signal 330 enters the encoding system 200, the rate determination controller
220 first determines the type of speech in speech frame (a) based on
methods well known to those of ordinary skill in the art. As shown, speech frame (a) is
low quality speech or background noise and thus the rate determination controller 220
may position the encoder selector 212 to select a low data rate speech encoder, such as
the speech encoder 230 at 1.5 kbps, to encode speech frame (a). As for the next
speech frame (b), the rate determination controller 220 may retain the same position
for the encoder selector 212. However, for the speech frames (c) and (f), the rate
determination controller 220 may select a medium data rate speech encoder, such as
the speech encoder 240 at 8.0 kbps. As for speech frames (h), (i), (l) and (m), the rate
determination controller 220 may select a high data rate speech encoder, such as the
speech encoder 250 at 32.0 kbps, to preserve the quality of speech.
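
The Figure 3 walk-through can be turned into a short calculation of how many bits the named frames consume. This is a sketch under stated assumptions: only the eight frames the text assigns explicitly are included (the remaining frames of Figure 3 are not specified), frames are taken as 10 ms long, and the helper names are hypothetical.

```python
FRAME_MS = 10  # frame duration from the text

# Encoder assignment assumed in the Figure 3 discussion.
RATE_KBPS = {230: 1.5, 240: 8.0, 250: 32.0}

# Frame -> selected encoder, as stated in the text for frames a, b, c, f, h, i, l, m.
SELECTION = {"a": 230, "b": 230, "c": 240, "f": 240,
             "h": 250, "i": 250, "l": 250, "m": 250}


def frame_bits(encoder: int) -> float:
    """Bits used by one frame: kbps multiplied by milliseconds."""
    return RATE_KBPS[encoder] * FRAME_MS


def total_bits(selection: dict) -> float:
    return sum(frame_bits(encoder) for encoder in selection.values())


def average_rate_kbps(selection: dict) -> float:
    return total_bits(selection) / (len(selection) * FRAME_MS)


print(total_bits(SELECTION))         # 1470.0
print(average_rate_kbps(SELECTION))  # 18.375
print(len(SELECTION) * frame_bits(250))  # 2560.0 if every frame used encoder 250
```

For these eight frames, per-frame selection uses 1470 bits, compared with 2560 bits if all of them were sent through the 32.0 kbps encoder.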

Figure 4 illustrates another embodiment of the present invention. As shown,
the speech encoding system 400 includes a network controller 430, a rate
determination controller 420 and a plurality of speech encoders 1..n, denoted 440, 450,
460, 470 and 480, respectively, for transmitting speech signal 410 over a
communication channel 460. According to this embodiment, the network controller
430 may select one of a plurality of groups of speech encoders for encoding the
speech signal 410. The network controller 430 may route the speech signal 410 either
through line 412 or 414 according to predetermined factors of the network provider.
As shown, line 412 routes the speech signal 410 to a first group of encoders, including
speech encoders 440, 460 and 480. Line 414, on the other hand, routes the speech
signal 410 to a second group of speech encoders, including speech encoders 440, 450,
460, 470 and 480. In one embodiment, the speech encoders 440, 450, 460, 470 and
480 may support different data rates of G.729 Annex I, 0, 1.5, 6.4, 8.0 and 11.2 kbps,
respectively. In another embodiment, the speech encoder 440 may support 0 kbps data
rate of the G.729 Annex I standard, the speech encoder 450 may support 5.3 kbps of
the G.723.1 standard, the speech encoder 460 may support 8.0 kbps data rate of the
G.729 Annex I standard, the speech encoder 470 may support 16.0 kbps data rate of
the G.728 standard and the speech encoder 480 may support 64.0 kbps data rate of the
G.711 standard. In short, various data rates of different standards may be combined
and supported accordingly.

Just as explained above in relation to the embodiment of Figure 2, the rate
determination controller 420 may route each frame of the speech signal 410 using
encoder selectors 413 and 415 to one of the plurality of speech encoders according to
characteristics of each speech frame. However, the network controller 430 may
designate a specific group of speech encoders that may be utilized by the rate
determination controller 420. For example, during certain hours of the day, the
network controller 430 may route the speech signal through the line 412 to the
encoder selector 413, which provides fewer speech encoders to choose from
for use by the rate determination controller 420.
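
The two-level selection of Figure 4 can be sketched as follows: the network controller picks a line (and therefore a group of permitted encoders), and the rate determination controller then picks one encoder per frame from within that group. The groups and the mixed-standard rates below come from the description of Figure 4; the selection policy (the lowest-rate permitted encoder that meets the rate wanted for the frame, otherwise the highest-rate one available) and the function name are assumptions made only for illustration.

```python
# Rates of the mixed-standard embodiment described for Figure 4.
RATE_KBPS = {440: 0.0, 450: 5.3, 460: 8.0, 470: 16.0, 480: 64.0}

# Line chosen by the network controller 430 -> encoders reachable through it.
GROUPS = {
    412: (440, 460, 480),            # first group, via encoder selector 413
    414: (440, 450, 460, 470, 480),  # second group, via encoder selector 415
}


def select_in_group(line: int, wanted_kbps: float) -> int:
    """Pick an encoder for one frame, restricted to the group the network allows."""
    group = GROUPS[line]
    adequate = [e for e in group if RATE_KBPS[e] >= wanted_kbps]
    if adequate:
        return min(adequate, key=RATE_KBPS.get)  # cheapest encoder that suffices
    return max(group, key=RATE_KBPS.get)         # best effort within the group


print(select_in_group(414, 5.0))  # 450
print(select_in_group(412, 5.0))  # 460
```

With the same frame, routing through line 412 yields a different encoder than line 414 because encoder 450 is not in the first group, which is the restriction the network controller imposes.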

The present invention thus provides an apparatus and method for providing
flexible variable bit rate encoding. The flexible encoding scheme facilitates encoding
of speech using any desired standard, criteria or fixed bit-rate encoders. In one
embodiment, the speech encoders 440-480 may be existing fixed bit-rate encoders,
such as GSM EFR (Enhanced Full-Rate), IS-641 (TIA/EIA TDMA standard), etc., or
in yet other embodiments, the speech encoders 440-480 may include single multi-rate
standards, such as GSM AMR (Adaptive Multi-Rate), or any combinations of the
above.

At any given time interval, speech may be encoded using one or a plurality of
standards and/or criteria. The encoding system of the invention may interface with a
decoding system based on existing standards. Alternatively, it may interface with a
decoding system implemented using new standards or a decoding system with a
combination of existing and new standards. In this manner, the invention provides
flexibility in choice of standards, bandwidth requirements or quality of service, while
enabling use with existing systems and/or new systems. Existing decoding systems
may interface with the encoding system of the invention without any change or
alteration. At the same time, the encoding system may accommodate the use of new
standards while providing flexibility of choice.

The present invention may be embodied in other specific forms without
departing from its spirit or essential characteristics. The described embodiments are
to be considered in all respects only as illustrative and not restrictive. The scope of
the invention is, therefore, indicated by the appended claims rather than the foregoing
description. All changes which come within the meaning and range of equivalency of
the claims are to be embraced within their scope.